Efficient Distribution Mining and Classification

Authors

  • Yasushi Sakurai
  • Rosalynn Chong
  • Lei Li
  • Christos Faloutsos
Abstract

We define and solve the problem of “distribution classification” and, more generally, “distribution mining”. Given n distributions (i.e., clouds) of multi-dimensional points, we want to classify them into k classes and to find patterns, rules and outlier clouds. For example, consider the 2-d case of sales of items, where, for each item sold, we record the unit price and quantity; each customer is then represented as a distribution/cloud of 2-d points (one for each item the customer bought). We want to group similar customers together, e.g., for market segmentation or anomaly/fraud detection. We propose D-Mine to achieve this goal. Our main contribution is Theorem 3.1, which shows how to use wavelets to speed up the cloud-similarity computations. Extensive experiments on both synthetic and real multidimensional data sets show that our method achieves up to 400 times faster wall-clock time than the naive implementation, with comparable (and occasionally better) classification quality.
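
The general idea of comparing clouds through wavelet coefficients can be illustrated with a minimal sketch. This is not D-Mine and not Theorem 3.1, neither of which is spelled out in the abstract. It assumes each cloud is bucketized into a small normalized histogram, that clouds are compared with Euclidean distance, and that an orthonormal Haar transform is used; all function names are hypothetical. Because an orthonormal transform preserves Euclidean distance, the distance computed on only the first few (coarsest) coefficients never exceeds the full distance, so it can serve as a cheap lower bound.

```python
import math
import random


def histogram(points, bins=4):
    """Flattened bins x bins histogram of 2-d points in [0, 1]^2, summing to 1."""
    counts = [0.0] * (bins * bins)
    for x, y in points:
        i = min(int(x * bins), bins - 1)
        j = min(int(y * bins), bins - 1)
        counts[i * bins + j] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def haar(vec):
    """Orthonormal 1-d Haar transform; len(vec) must be a power of two.

    Output order: overall average first, then details from coarse to fine.
    """
    out = list(vec)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + diff
        n = half
    return out


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def make_cloud(rng, cx, cy, n=200, spread=0.1):
    """Synthetic cloud: n Gaussian points around (cx, cy), clamped to [0, 1]."""
    def clamp(v):
        return min(max(v, 0.0), 1.0)
    return [(clamp(rng.gauss(cx, spread)), clamp(rng.gauss(cy, spread)))
            for _ in range(n)]


# Two clouds from the same region and one from a different region.
rng = random.Random(0)
cloud_a = make_cloud(rng, 0.25, 0.25)
cloud_b = make_cloud(rng, 0.25, 0.25)
cloud_c = make_cloud(rng, 0.75, 0.75)

wa, wb, wc = (haar(histogram(c)) for c in (cloud_a, cloud_b, cloud_c))

m = 4  # number of coarse coefficients kept (an arbitrary choice here)
print("a vs b: coarse", l2(wa[:m], wb[:m]), "full", l2(wa, wb))
print("a vs c: coarse", l2(wa[:m], wc[:m]), "full", l2(wa, wc))
```

In this sketch the coarse distance uses 4 numbers per cloud instead of 16; a classifier could use it to discard clearly dissimilar clouds before computing any full distance.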

Similar resources

Efficient Data Mining with Evolutionary Algorithms for Cloud Computing Application

With the rapid development of the internet, the amount of information and data being produced is massive. Clients are therefore confused by the huge amount of data, and it is difficult to tell which data are useful. Data mining can overcome this problem. When data mining is run on cloud computing, it reduces processing time, energy usage and costs. As the speed of ...

Improving reservoir rock classification in heterogeneous carbonates using boosting and bagging strategies: A case study of early Triassic carbonates of coastal Fars, south Iran

An accurate reservoir characterization is a crucial task for the development of quantitative geological models and reservoir simulation. In the present research work, a novel view is presented on the reservoir characterization using the advantages of thin section image analysis and intelligent classification algorithms. The proposed methodology comprises three main steps. First, four classes of...

Resources classification using fractal modelling in Eastern Kahang Cu-Mo porphyry deposit, Central Iran

Resources/reserves classification is crucial for the block model creation used in mine planning and feasibility studies. Selection of estimation methods is an essential part of mineral exploration and mining activities. In other words, resources classification is an issue for mining companies, investors, financial institutions and authorities, but it remains subject to some confusion. The aim of t...

An Efficient Representation Model of Distance Distribution Between Two Uncertain Objects

In this paper, we consider the problem of efficient computation of distance distribution between two uncertain objects. It is important to many uncertain query evaluation (e.g., range queries, nearest-neighbour queries) and uncertain data mining (e.g., classification, clustering and outlier detection). However, existing approaches involve distance computations between samples of two objects, wh...

Evaluating the effect of using different reference spectra on SAM classification results: an implication for hydrothermal alteration mapping

This research was performed with the objective of evaluating the accuracy of spectral angle mapper (SAM) classification using different reference spectra. The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) digital images were applied in the SAM classification in order to map the distribution of hydrothermally altered rocks in the Kerman Cenozoic magmatic arc (KCMA), Iran...

Estimation of reliability-based maintenance time intervals of Load-Haul-Dumper in an underground coal mine

Reliability estimation plays a significant role in the performance assessment of mining equipment, and aids in designing efficient and effective preventive maintenance strategies. Continuous and random/irregular occurrence of failures in a system could be the main cause for performance drop of machinery. The accomplishment of a projected level of production is possible only by an efficient oper...

Publication date: 2008